420 research outputs found

    Enriching information extraction pipelines in clinical decision support systems

    Official Doctoral Programme in Information and Communication Technologies. 5032V01. [Abstract] Multicentre health studies are important for increasing the impact of medical research findings because of the number of subjects they are able to engage. To simplify the execution of these studies, the data-sharing process should be effortless, for instance through the use of interoperable databases. However, achieving this interoperability is still an ongoing research topic, mainly due to data governance and privacy issues. In the first stage of this work, we propose several methodologies to optimise the harmonisation pipelines of health databases. This work focused on harmonising heterogeneous data sources into a standard data schema, namely the OMOP CDM, which has been developed and promoted by the OHDSI community. We validated our proposal using data sets of Alzheimer's disease patients from distinct institutions. In the following stage, aiming to enrich the information stored in OMOP CDM databases, we investigated solutions to extract clinical concepts from unstructured narratives, using information retrieval and natural language processing techniques. The validation was performed using datasets provided in scientific challenges, namely the National NLP Clinical Challenges (n2c2). In the final stage, we aimed to simplify the protocol execution of multicentre studies by proposing novel solutions for profiling, publishing and facilitating the discovery of databases. Some of the developed solutions are currently being used in three European projects that aim to create federated networks of health databases across Europe.

    Garantia de privacidade na exploração de bases de dados distribuídas (Privacy assurance in the exploration of distributed databases)

    Anonymisation is currently one of the biggest challenges when sharing sensitive personal information. The problem cuts across many application domains, but it becomes more critical when dealing with health information. The most common approach to avoiding disclosure is to remove from the original dataset all attributes that can be associated directly with an individual. However, studies have shown that this does not fully guarantee anonymity: simple anonymisation procedures can sometimes be reversed using specific patient characteristics, namely when the anonymisation is based on hidden key attributes, allowing subjects to be re-identified. In this work, we propose a secure architecture to share information from distributed databases without compromising the subjects' privacy. The work initially focused on identifying techniques to link information between multiple data sources in order to reverse anonymisation procedures. In a second phase, we proposed a methodology to perform queries over distributed databases without breaking anonymity. The architecture was validated using a standard relational data schema that is widely adopted in observational clinical research studies. Master's in Cybersecurity.

    Clinical concept normalization on medical records using word embeddings and heuristics

    Electronic health records contain valuable information on patients' clinical history in the form of free text. Manually analyzing millions of these documents is infeasible, and automatic natural language processing methods are essential for exploiting these data efficiently. Within this field, the normalization of clinical entities, where the aim is to link entity mentions to reference vocabularies, is of utmost importance for successfully extracting knowledge from clinical narratives. In this paper we present sieve-based models combined with heuristics and word embeddings, and report the results of our participation in the 2019 n2c2 (National NLP Clinical Challenges) shared task on clinical concept normalization.
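
    The sieve idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' system: the vocabulary, concept IDs, abbreviation table and threshold below are hypothetical, and character-trigram count vectors stand in for real word embeddings. Each sieve either returns a concept ID or passes the mention on to the next, less precise sieve.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy vocabulary: concept ID -> preferred term.
VOCABULARY = {
    "C001": "myocardial infarction",
    "C002": "hypertension",
    "C003": "diabetes mellitus",
}

# Hypothetical abbreviation table used by the second sieve.
ABBREVIATIONS = {"mi": "myocardial infarction", "htn": "hypertension"}


def trigram_vector(text):
    """Stand-in for a word embedding: counts of character trigrams."""
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def exact_sieve(mention):
    """Sieve 1: case-insensitive exact match against preferred terms."""
    cleaned = mention.lower().strip()
    for cid, term in VOCABULARY.items():
        if cleaned == term:
            return cid
    return None


def abbreviation_sieve(mention):
    """Sieve 2: expand a known abbreviation, then retry the exact match."""
    expanded = ABBREVIATIONS.get(mention.lower().strip())
    return exact_sieve(expanded) if expanded else None


def similarity_sieve(mention, threshold=0.5):
    """Sieve 3: most similar term, accepted only above an assumed threshold."""
    vec = trigram_vector(mention)
    best_cid, best_score = None, 0.0
    for cid, term in VOCABULARY.items():
        score = cosine(vec, trigram_vector(term))
        if score > best_score:
            best_cid, best_score = cid, score
    return best_cid if best_score >= threshold else None


# Sieves are ordered from most to least precise.
SIEVES = [exact_sieve, abbreviation_sieve, similarity_sieve]


def normalize(mention):
    """Return the concept ID from the first sieve that fires, else None."""
    for sieve in SIEVES:
        cid = sieve(mention)
        if cid is not None:
            return cid
    return None
```

    Ordering the sieves by precision means a fuzzy match is only attempted when the stricter rules fail, which is the usual motivation for a sieve cascade.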

    A Recommender System to Help Refining Clinical Research Studies

    [Abstract] The process of refining the research question in a medical study depends greatly on the current background of the investigated subject. The information found in prior works can directly impact several stages of the study, namely the cohort definition stage. Besides previously published methods, researchers could also leverage other materials, such as the output of cohort selection tools, to enrich and accelerate their own work. However, this kind of information is not always captured by search engines. In this paper, we present a methodology, based on a combination of content-based retrieval and text annotation techniques, to identify relevant scientific publications related to a research question and to the selected data sources. This work has received support from the EU/EFPIA Innovative Medicines Initiative 2 Joint Undertaking under grant agreement No 806968. JFS and JRA are funded by the FCT - Foundation for Science and Technology (national funds) under the grants PD/BD/142878/2018 and SFRH/BD/147837/2019 respectively.

    Ferramenta de gestão de protocolos clínicos (Clinical protocol management tool)

    Decision support systems are currently important tools for guiding clinicians' decisions and supporting patients' treatments. These systems have been studied over the last decades, leading to some well-defined best practices for building new solutions. The objective of this project was to build a clinical decision support system, delivered as a web application, with a core engine based on predefined rules that can be customised by end users. These rules are the base premises that define the procedure to apply, that is, the structure of the clinical protocol. The main motivation for this work was the treatment of diabetic inpatients and outpatients in hospital services other than endocrinology. To keep the solution generic and applicable to any clinical scenario, the system does not depend on any specific patient data, nor on the protocols. The application follows the client-server model, based on a microservice architecture, and provides a modern web user interface. The project was carried out in close collaboration with the Hospital Center of Baixo do Vouga, resulting in a solution that can assist health professionals in the treatment of patients, reducing the risk of errors and providing better control and monitoring of health care services. Master's in Computer and Telematics Engineering.

    Enhancing Decision-making Systems with Relevant Patient Information by Leveraging Clinical Notes

    [Abstract] Hospitalised patients suffering from secondary illnesses that require daily medication typically need personalised treatment. Although clinical guidelines were designed considering those circumstances, existing decision-support features fail to assimilate detailed, relevant patient information. This opens up opportunities for systems capable of evaluating such data in real time against existing knowledge and providing recommendations during clinical treatments. In this paper, we present a proposal for a new feature to integrate with electronic health record (EHR) systems that enriches the health treatment process by automatically extracting information from patient medical notes and aggregating it in clinical protocols. Our goal is to leverage the historical component of the patient trajectory to improve the performance of clinical decision support systems. Funding: EU/EFPIA Innovative Medicines Initiative 2 Joint Undertaking; 806968. NETDIAMOND project; POCI-01-0145-FEDER-016385. Foundation for Science and Technology; PD/BD/142878/2018. Foundation for Science and Technology; SFRH/BD/147837/201

    Discovery of biomedical databases through semantic questioning

    Many clinical studies depend greatly on the efficient identification of relevant datasets. This selection can be performed in existing health data catalogues by searching the available metadata. The search process can be optimised through question-answering interfaces, which help researchers explore the available data. However, when searching across distinct catalogues, the lack of metadata harmonisation imposes a few bottlenecks. This paper presents a methodology to allow semantic search over several biomedical database catalogues by extracting the information using shared domain knowledge. The resulting pipeline allows the converted data to be published as FAIR endpoints, and it provides an end-user interface that accepts natural language questions. This work has received support from the EU/EFPIA Innovative Medicines Initiative 2 Joint Undertaking under grant agreement No 806968. AP and JRA are funded by the FCT - Foundation for Science and Technology (national funds) under the grants PD/BD/142877/2018 and SFRH/BD/147837/2019 respectively.

    A Recommender System Based on Cohorts’ Similarity

    [Abstract] Many clinical trials and scientific studies have been conducted to better understand the genetic and environmental associations of Alzheimer's disease. However, these studies are often based on a small number of participants. To address this limitation, there is an increasing demand for multi-cohort studies, which can provide higher statistical power and stronger clinical evidence. This data integration, however, implies dealing with the diversity of cohort structures and the wide variability of concepts. Moreover, discovering similar cohorts to extend a running study is typically a demanding task. In this paper, we present a recommendation system for finding similar cohorts based on profile interests. The method uses collaborative filtering mixed with context-based retrieval techniques to find relevant cohorts in the scientific literature about Alzheimer's disease. The method was validated on a set of 62 cohorts. Funding: National Science Foundation (Portugal); POCI-01-0145-FEDER-01638

    Methodology to identify a gene expression signature by merging microarray datasets

    A vast number of microarray datasets have been produced as a way to identify differentially expressed genes and gene expression signatures. A better understanding of these biological processes can help in the diagnosis and prognosis of diseases, as well as in predicting the therapeutic response to drugs. However, most of the available datasets contain a small number of samples, leading to low statistical, predictive and generalization power. One way to overcome this problem is to merge several microarray datasets into a single dataset, which is typically a challenging task. Statistical methods or supervised machine learning algorithms are usually used to determine gene expression signatures. Nevertheless, statistical methods require an arbitrary threshold to be defined, and supervised machine learning methods can be ineffective when applied to high-dimensional datasets like microarrays. We propose a methodology to identify gene expression signatures by merging microarray datasets. This methodology uses statistical methods to obtain several sets of differentially expressed genes and uses supervised machine learning algorithms to select the gene expression signature. The methodology was validated in two distinct research applications: one using heart failure microarray datasets and the other using autism spectrum disorder microarray datasets. For the first, we obtained a gene expression signature composed of 117 genes, with a classification accuracy of approximately 98%. For the second, we obtained a gene expression signature composed of 79 genes, with a classification accuracy of approximately 82%. The methodology was implemented in the R language and is available, under the MIT licence, at https://github.com/bioinformatics-ua/MicroGES.
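
    The two-stage idea in this abstract (several statistically derived candidate gene sets, then a supervised learner to choose among them) can be sketched as follows. This is an illustration under stated assumptions, not the MicroGES implementation, which is written in R: Welch's t statistic, the thresholds, the nearest-centroid classifier and all function names are choices made for the example only.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two samples (each needs at least 2 values)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se else 0.0


def candidate_gene_sets(expr, labels, thresholds):
    """Stage 1: one set of differentially expressed genes per |t| threshold."""
    sets = []
    for thr in thresholds:
        genes = []
        for gene, values in expr.items():
            case = [v for v, y in zip(values, labels) if y == 1]
            ctrl = [v for v, y in zip(values, labels) if y == 0]
            if abs(welch_t(case, ctrl)) >= thr:
                genes.append(gene)
        sets.append(sorted(genes))
    return sets


def loo_accuracy(expr, labels, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier on `genes`."""
    if not genes:
        return 0.0
    n = len(labels)
    correct = 0
    for i in range(n):
        dist = {}
        for cls in (0, 1):
            # Class centroid computed without the held-out sample i.
            idx = [j for j in range(n) if j != i and labels[j] == cls]
            dist[cls] = sum(
                (expr[g][i] - mean(expr[g][j] for j in idx)) ** 2 for g in genes
            )
        predicted = min(dist, key=dist.get)
        correct += predicted == labels[i]
    return correct / n


def select_signature(expr, labels, thresholds=(2.0, 4.0, 8.0)):
    """Stage 2: keep the candidate set with the best accuracy (ties: fewer genes)."""
    candidates = candidate_gene_sets(expr, labels, thresholds)
    return max(
        candidates,
        key=lambda genes: (loo_accuracy(expr, labels, genes), -len(genes)),
    )


# Toy expression matrix: gene -> one value per sample (1 = case, 0 = control).
expr = {
    "G1": [5.0, 5.2, 4.9, 1.0, 1.1, 0.9],
    "G2": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],
    "G3": [3.0, 3.5, 2.5, 2.0, 2.4, 1.6],
}
labels = [1, 1, 1, 0, 0, 0]
signature = select_signature(expr, labels)
```

    Letting the classifier choose among threshold-specific gene sets avoids committing to one arbitrary cut-off. In this sketch the statistical filter sees every sample, so the reported accuracy is optimistic; a careful evaluation would repeat the filtering inside each cross-validation fold.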